Abstract

In this paper, we present a generalized estimating equations based estimation
approach and a variable selection procedure for single-index models when the
observed data are clustered. Unlike the case of independent observations,
bias-correction is necessary when general working correlation matrices are used in
the estimating equations. Our variable selection procedure based on smooth-threshold
estimating equations (Ueki, 2009) can automatically eliminate irrelevant parameters
by setting them as zeros and is computationally simpler than alternative approaches
based on shrinkage penalty. The resulting estimator consistently identifies the
significant variables in the index, even when the working correlation matrix is
misspecified. The asymptotic property of the estimator is the same whether or not
the nonzero parameters are known (in both cases we use the same estimating
equations), thus achieving the oracle property in the sense of Fan and Li (2001).
The finite sample properties of the estimator are illustrated by some simulation
examples, as well as a real data application.

Keywords and phrases: Generalized estimating equation; Longitudinal data;
Oracle property; Single-index model; Variable selection.
1 Introduction

Many data sets nowadays are characterized by two properties that make their
statistical analysis complicated: high dimensionality and dependence of observations.
In fact, clustered data with a medium to large number of covariates are often produced
in fields such as biology, engineering, or medicine. For clusters $1 \le i \le n$,
let $Y_i = (Y_{i1}, \ldots, Y_{im_i})^T$ denote the vector of outcome values, which
depends on a $p \times m_i$ covariate matrix $X_i = (X_{i1}, \ldots, X_{im_i})$,
$X_{ij} = (X_{ij1}, \ldots, X_{ijp})^T$. When the dimension of $X_{ij}$ is high, it is
worthwhile to seek a more parsimonious representation of the regression function in
the hope of making estimation feasible for moderate sample sizes. Dimension reduction
is one way towards this goal. As a popular instantiation of the dimension reduction
idea, the single-index model for clustered data is defined by

$$Y_i = g(X_i^T\beta) + \varepsilon_i, \quad i = 1, 2, \ldots, n, \tag{1.1}$$

where

$$g(X_i^T\beta) = (g(X_{i1}^T\beta), \ldots, g(X_{im_i}^T\beta))^T, \quad
\varepsilon_i = (\varepsilon_{i1}, \ldots, \varepsilon_{im_i})^T.$$

Here $g(\cdot)$ is an unknown link function and $\varepsilon_i$ is a mean-zero random
error with covariance matrix $\mathrm{Var}(\varepsilon_i) = \Sigma_i$ for the $i$th
subject, and $\beta = (\beta_1, \ldots, \beta_p)^T$ is the vector of unknown
parameters for the index associated with the covariates. Since both $g$ and $\beta$
are unknown, it is commonly assumed that $\|\beta\| = 1$ for identifiability, where
$\|\cdot\|$ is the Euclidean norm. The true value of $\beta$ will be denoted by
$\beta_0$. Throughout this paper we assume that the total sample size
$\sum_{i=1}^n m_i$ is large (diverges to $\infty$ in our theoretical investigations)
while $\{m_i, i = 1, \ldots, n\}$ are uniformly bounded.

The popularity of the semiparametric single-index model presented above can
be attributed to its ability to address the so-called "curse of dimensionality"
problem in multi-dimensional nonparametric regression by making use of a combination
of predictors as a univariate index, which hopefully can still capture some important
relationships between the covariates and the responses. As a dimension reduction
method, single-index models have been studied extensively. See, for example,
Härdle et al. (1993); Ichimura (1993); Carroll et al. (1997); Xia et al. (1999);
Naik and Tsai (2000, 2001, 2004); Yu and Ruppert (2002); Delecroix et al. (2003);
Zhu and Xue (2006); Xia and Härdle (2006); Kong and Xia (2007); Wong et al. (2008).
More recently, Bai et al. (2009) studied the single-index model for longitudinal data,
and proposed to use splines to estimate $\beta$ and the unknown link function based on
quadratic inference functions. Our study here is different from that work in many
respects. Bai et al. (2009) considered asymptotic analysis with a fixed number of
knots and thus their analysis is not appropriate when the true link function is not
inside the spline space. In particular, their asymptotic analysis is only for a
parametric model since the number of unknown parameters does not diverge with sample
size. Our estimation method and asymptotic analysis do not impose this constraint,
and treat the unknown link function as a truly nonparametric component. Furthermore,
we will consider the variable selection problem, which has not been investigated
before for single-index models on longitudinal data.

Even though single-index models avoid the problem of "curse of dimensionality"
to some extent, in practice, one would still want to investigate which covariates
are relevant for prediction, both for better interpretation of the model, and for
better efficiency of the estimator. In recent years, penalization or shrinkage based
variable selection methods have attracted lots of attention, due to their
computational efficiency for high-dimensional problems, and their statistical
stability compared to information criterion based methods (Fan and Li, 2001;
Zou, 2006). Examples of shrinkage estimation methods include LASSO (Tibshirani, 1996),
SCAD (Fan and Li, 2001), Adaptive Lasso (Zou, 2006), Dantzig selector (Candes and
Tao, 2007), and many others. For single-index models, Naik and Tsai (2001) considers
variable selection using sliced inverse regression, Kong and Xia (2007) uses
cross-validation to select the significant variables, but these estimators are not
expected to have the oracle property (Fan and Li, 2001).

In this paper, we build on the estimating equations based approach for
single-index models (Chang et al., 2010), which was shown to result in a more
efficient estimator for the index vector, and extend it to the case where data are
clustered. The bias-corrected estimating equations we use here were proposed in
Li et al. (2010), which focused on the construction of confidence regions of partially
linear single-index models for longitudinal data through the empirical likelihood
method. Furthermore, variable selection is achieved by extending the smooth-threshold
estimating equations proposed in Ueki (2009). Compared to the shrinkage methods
reviewed above, this approach dispenses with convex optimization and is thus
computationally simpler. We will theoretically demonstrate the oracle property of the
estimator as well as empirically illustrate its performance. We also note that
recently Cui et al. (2011) have extended the estimating equations approach to
generalized single-index models which do not involve clustered data. We expect that
this can also be extended to the case with variable selection for clustered data,
although this is outside the scope of the current paper.

The rest of the paper is organized as follows. In Section 2 we present our estima- 
tion approach for single-index models with clustered data, and in Section 3 a variable 
selection procedure based on smooth-threshold generalized estimating equations is 
presented. The oracle property for the proposed estimator is also discussed. In Sec- 
tion 4, we report some simulation studies as well as an application to a real data set. 
Our simulations show the advantage of incorporating the intra-cluster correlation in 
estimation. The proofs of the theoretical results are presented in the Appendix.



2 Bias-corrected GEE estimation

In model (1.1), we imposed $\|\beta\| = 1$ for identifiability, which implies that the
parameter is not an interior point of the $p$-dimensional space, causing some
difficulty in inference. We use the "remove one component" method used previously in
Yu and Ruppert (2002); Zhu and Xue (2006); Chang et al. (2010). Without loss of
generality, we assume that for some $1 \le r \le p$, $\beta_r > 0$. Let
$\beta^{(r)} = (\beta_1, \ldots, \beta_{r-1}, \beta_{r+1}, \ldots, \beta_p)^T$ be the
$(p-1)$-dimensional parameter vector after removing the $r$th component of $\beta$.
Then, we may write

$$\beta = \beta(\beta^{(r)}) = (\beta_1, \ldots, \beta_{r-1},
(1 - \|\beta^{(r)}\|^2)^{1/2}, \beta_{r+1}, \ldots, \beta_p)^T.$$

Since $\|\beta_0^{(r)}\| < 1$, $\beta(\cdot)$ is infinitely differentiable in a
neighborhood of $\beta_0^{(r)}$, and the Jacobian is

$$J_{\beta^{(r)}} = \frac{\partial \beta}{\partial \beta^{(r)}} = (b_1, \ldots, b_p)^T,$$

where $b_s$ is a $(p-1)$-dimensional unit vector with $s$th component 1 for
$s \ne r$, and

$$b_r = -(1 - \|\beta^{(r)}\|^2)^{-1/2}\beta^{(r)}.$$
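As an illustration only, the "remove one component" map and its Jacobian can be sketched in a few lines of Python. The function names `full_index` and `jacobian` and the numerical values are hypothetical and not part of the paper. In this sketch the unit entry of row $s$ sits in the column that the $s$th component occupies in the reduced vector, so it is column $s-1$ for components after the removed one.

```python
import math

def full_index(beta_r, r):
    """Rebuild the unit-norm p-vector beta from the (p-1)-vector beta^(r).

    r is the 1-based position of the removed (positive) component.
    """
    norm_sq = sum(b * b for b in beta_r)
    if norm_sq >= 1.0:
        raise ValueError("beta^(r) must have Euclidean norm below 1")
    removed = math.sqrt(1.0 - norm_sq)
    return beta_r[: r - 1] + [removed] + beta_r[r - 1:]

def jacobian(beta_r, r):
    """Return the p x (p-1) Jacobian d beta / d beta^(r) as a list of rows."""
    p = len(beta_r) + 1
    scale = -1.0 / math.sqrt(1.0 - sum(b * b for b in beta_r))
    rows = []
    col = 0  # column of the identity block used by the next kept component
    for s in range(1, p + 1):
        if s == r:
            # Row for the removed component: -(1 - ||beta^(r)||^2)^(-1/2) beta^(r)
            rows.append([scale * b for b in beta_r])
        else:
            rows.append([1.0 if c == col else 0.0 for c in range(p - 1)])
            col += 1
    return rows

# Illustrative values: p = 4 with the second component removed.
beta_r = [0.6, 0.0, 0.3]
beta = full_index(beta_r, r=2)
J = jacobian(beta_r, r=2)
```

The reconstructed `beta` has unit norm by construction, which is the point of the reparametrization: estimation can then proceed over the unconstrained interior of the unit ball in dimension $p-1$.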

Based on these notations, we construct the generalized estimating equation
(GEE) for the single-index model with clustered data as

$$\sum_{i=1}^n Z_i^T R_i^{-1}(Y_i - g(X_i^T\beta)) = 0, \tag{2.1}$$

where

$$Z_i = (g'(X_{i1}^T\beta) J_{\beta^{(r)}}^T X_{i1}, \ldots,
g'(X_{im_i}^T\beta) J_{\beta^{(r)}}^T X_{im_i})^T,$$

and $R_i$, $i = 1, \ldots, n$, are the working covariance matrices, possibly depending
on some unknown parameter $\alpha$, which can be estimated by the method of Liang and
Zeger (1986). From the estimating equations, we can see that if $R_i = I_{m_i}$, with
$I_{m_i}$ the $m_i \times m_i$ identity matrix, we just ignore the dependence of the
data within a cluster, that is, assume working independence (Lin and Carroll, 2000).
For the following theoretical results, we do not require $R_i$ to be the same as the
true covariance $\Sigma_i$, although $R_i = \Sigma_i$ results in the most efficient
estimator.

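The estimating equation (2.1) contains the unknown functions $g(\cdot)$ and
$g'(\cdot)$. To solve this problem, we need to plug in some estimates for these two
unknown functions. Here we use local linear regression (Fan and Gijbels, 1996).
Similar to Chang et al. (2010), for any given $\beta$, we can estimate $g(t)$ and
$g'(t)$ by minimizing

$$\sum_{i=1}^n \sum_{j=1}^{m_i} [Y_{ij} - a - b(X_{ij}^T\beta - t)]^2
K_h(X_{ij}^T\beta - t),$$

where $K$ is a kernel function, $K_h(\cdot) = K(\cdot/h)/h$ and $h$ is the bandwidth.
Let $(\hat a, \hat b)$ be the minimizers and set $\hat g(t, \beta) = \hat a$ and
$\hat g'(t, \beta) = \hat b$. Simple and standard calculations yield the closed form
expression

$$\hat g(t, \beta) = \frac{\sum_{i=1}^n \sum_{j=1}^{m_i} U_{n,ij}(t, \beta) Y_{ij}}
{\sum_{i=1}^n \sum_{j=1}^{m_i} U_{n,ij}(t, \beta)}, \quad
\hat g'(t, \beta) = \frac{\sum_{i=1}^n \sum_{j=1}^{m_i} \tilde U_{n,ij}(t, \beta) Y_{ij}}
{\sum_{i=1}^n \sum_{j=1}^{m_i} U_{n,ij}(t, \beta)},$$

where

$$U_{n,ij}(t, \beta) = K_h(X_{ij}^T\beta - t)\{S_{n,2}(t, \beta)
- (X_{ij}^T\beta - t) S_{n,1}(t, \beta)\},$$

$$\tilde U_{n,ij}(t, \beta) = K_h(X_{ij}^T\beta - t)\{(X_{ij}^T\beta - t) S_{n,0}(t, \beta)
- S_{n,1}(t, \beta)\},$$

and

$$S_{n,l}(t, \beta) = \frac{1}{n} \sum_{i=1}^n \sum_{j=1}^{m_i}
(X_{ij}^T\beta - t)^l K_h(X_{ij}^T\beta - t), \quad l = 0, 1, 2.$$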
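The local linear step can be made concrete with a short sketch. The following Python code is a minimal illustration under stated assumptions rather than the authors' implementation: the names `local_linear` and `epanechnikov` are hypothetical, the clusters are pooled into flat lists of index values $X_{ij}^T\beta$ and responses $Y_{ij}$, and the kernel moments are normalized by the number of observations instead of $n$ (the normalization cancels in the two ratios).

```python
def epanechnikov(u):
    """Kernel K(u) = 0.75 * (1 - u^2) on [-1, 1], zero elsewhere."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0

def local_linear(t, index, y, h, kernel=epanechnikov):
    """Return (g_hat(t), g_prime_hat(t)) from pooled index values and responses.

    index[k] plays the role of X_ij^T beta and y[k] the role of Y_ij.
    """
    n_obs = len(index)
    d = [x - t for x in index]
    w = [kernel(dk / h) / h for dk in d]  # K_h(X^T beta - t)
    # Kernel-weighted moments S_0, S_1, S_2 of the centred index.
    s0, s1, s2 = (
        sum(wk * dk ** l for wk, dk in zip(w, d)) / n_obs for l in (0, 1, 2)
    )
    u = [wk * (s2 - dk * s1) for wk, dk in zip(w, d)]        # weights for g
    u_tilde = [wk * (dk * s0 - s1) for wk, dk in zip(w, d)]  # weights for g'
    denom = sum(u)
    if denom <= 0.0:
        raise ValueError("too few observations within bandwidth h of t")
    g_hat = sum(uk * yk for uk, yk in zip(u, y)) / denom
    g_prime_hat = sum(uk * yk for uk, yk in zip(u_tilde, y)) / denom
    return g_hat, g_prime_hat
```

A quick sanity check of the closed form is that local linear fitting reproduces any exactly linear relationship: with responses $2 + 3x$ on a grid, `local_linear` returns the value $2 + 3t$ and the slope $3$ at every interior point $t$, up to rounding.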

Plugging these estimators into (2.1), we obtain the estimating equations

$$\sum_{i=1}^n \hat Z_i^T R_i^{-1}(Y_i - \hat g(X_i^T\beta)) = 0, \tag{2.3}$$

where

$$\hat Z_i = (\hat g'(X_{i1}^T\beta) J_{\beta^{(r)}}^T X_{i1}, \ldots,
\hat g'(X_{im_i}^T\beta) J_{\beta^{(r)}}^T X_{im_i})^T, \quad i = 1, \ldots, n.$$

We can also obtain an initial estimator of $\beta$, denoted by $\tilde\beta$, by
assuming working independence. When assuming working independence, the results in
Wang et al. (2010) apply with few changes, and in particular, $\tilde\beta$ is
$\sqrt{n}$-consistent under standard assumptions.

For our theoretical analysis, we will assume that $R_1, \ldots, R_n$ are prespecified
and known. We briefly discuss the more general case where $R_i$ must be estimated in
Remark 1 below. However, when we do not assume that $R_i$, $i = 1, \ldots, n$, are all
equal, similar to Wang et al. (2010), (2.3) leads to



J2zjRj\Y,-giXjp)) 



where 



n 

(g'(xr/3o)JjM(Xfc - E[M'^lP^])YRk'^k 

k=l 

1 " 

\im-Y,E [(g'(Xr/3o) JjMX,)^i?-i(g'(X,^/3o) JJmX, 

n n '—^ L Po Po 



and 



k=l 



n rrH: rrn^ n 



k=l j = l 1=1 li=l l2 = l 

9 {XI^(5q)E{X^^J\XI^(5q)R'1 Sk 



•ih 
11 



■J' 



U2s{-) is the sth component of U2s{,-)i Rk is the {i,j)th element of Rf^^ , k = 1, . . . , n; i,j 
1, . . . , nik and X^^^ is the sth element of J ^^Xk. If Rk , k = 1 , . . . , n are not equal 
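to each other, the arguments contained in Wang et al. (2010) that show the term
$U_{2s}(\beta_0^{(r)})$ is asymptotically negligible do not apply, and thus we cannot
show the asymptotic normality of the resulting estimator. Therefore, instead of GEE
(2.3), we incorporate the bias correction which was previously used in Li et al.
(2010), leading to the bias-corrected GEE

$$\sum_{i=1}^n \hat Z_i^T R_i^{-1}(Y_i - \hat g(X_i^T\beta)) = 0, \tag{2.4}$$

where

$$\hat Z_i = (\hat g'(X_{i1}^T\beta) J_{\beta^{(r)}}^T
(X_{i1} - \hat E[X_{i1} | X_{i1}^T\tilde\beta]), \ldots,
\hat g'(X_{im_i}^T\beta) J_{\beta^{(r)}}^T
(X_{im_i} - \hat E[X_{im_i} | X_{im_i}^T\tilde\beta]))^T, \quad i = 1, \ldots, n,$$

and $\hat E(X_{ij} | X_{ij}^T\tilde\beta)$ is a nonparametric estimate of
$E(X_{ij} | X_{ij}^T\beta_0)$ with $\beta_0$ replaced by the initial estimator
$\tilde\beta$, that is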

il = li2=l 

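In the following, and also in the proofs in the Appendix, with a slight abuse of
notation but for simplicity in writing, we will write a matrix such as

$$(g'(X_{i1}^T\beta) J_{\beta^{(r)}}^T (X_{i1} - E[X_{i1} | X_{i1}^T\beta]), \ldots,
g'(X_{im_i}^T\beta) J_{\beta^{(r)}}^T (X_{im_i} - E[X_{im_i} | X_{im_i}^T\beta]))^T$$

simply as $g'(X_i^T\beta) J_{\beta^{(r)}}^T (X_i - E(X_i | X_i^T\beta))$ and take
$X_i - E(X_i | X_i^T\beta)$ to denote the $m_i \times p$ matrix with entries
$X_{ilq} - E[X_{ilq} | X_{il}^T\beta]$, $1 \le l \le m_i$, $1 \le q \le p$.

Denote the solution of (2.4) by $\beta^{*(r)}$ (the notations $\hat\beta$ and
$\hat\beta^{(r)}$ are reserved for the estimator based on smooth-threshold generalized
estimating equations later when we deal with variable selection), thus our final
estimator for $\beta$ is $\beta^* = \beta(\beta^{*(r)})$. We have the following
asymptotic property for $\beta^*$.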
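Theorem 1. Under the regularity conditions given in the Appendix, and supposing the
initial estimator $\tilde\beta$ is $\sqrt{n}$-consistent, there exists a solution
$\beta^*$ of (2.4) inside the ball $B = \{\beta : \|\beta - \beta_0\| \le C n^{-1/2}\}$
for $C$ sufficiently large. Furthermore,

$$\sqrt{n}(\beta^* - \beta_0) \xrightarrow{d} N(0, \Sigma_*),$$

where

$$\Sigma_* = J_{\beta_0^{(r)}} V^{-1} \Omega V^{-1} J_{\beta_0^{(r)}}^T.$$

The matrices $V$ and $\Omega$ are defined in condition C7 of the Appendix.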
Remark 1. We have assumed that $R_i$ are prespecified and known in the above.
However, from the proof, one easily sees that when $R_i$ is replaced by a consistent
estimator $\hat R_i$, the theorem still holds. When
$\Sigma_1 = \cdots = \Sigma_n = \Sigma$, a consistent estimator of $\Sigma$ is
$\sum_{i=1}^n \hat\varepsilon_i \hat\varepsilon_i^T / n$, where
$\hat\varepsilon_i = Y_i - \hat g(X_i^T\tilde\beta)$, with $\hat g$ and $\tilde\beta$
obtained from the working independence assumption (Balan and Schiopu-Kratina, 2005).
Alternatively, when $R_i$, $1 \le i \le n$, depend on some fixed parameter $\alpha$, a
moments-based method can be used to estimate $\alpha$ consistently, resulting in a
consistent estimator of $R_i$ (Liang and Zeger, 1986).
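To make the residual-based estimator in Remark 1 concrete, the following sketch averages the outer products of cluster residual vectors. It assumes equal cluster sizes and takes the residuals as given; the function name `residual_covariance` and the toy numbers are hypothetical, and computing the residuals themselves (from a working-independence fit) is outside the sketch.

```python
def residual_covariance(residuals):
    """Average the outer products of equal-length cluster residual vectors.

    residuals[i] is the residual vector of cluster i; the result is the
    m x m matrix sum_i e_i e_i^T / n, returned as a list of rows.
    """
    n = len(residuals)
    m = len(residuals[0])
    if any(len(e) != m for e in residuals):
        raise ValueError("all clusters must have the same size")
    return [
        [sum(e[a] * e[b] for e in residuals) / n for b in range(m)]
        for a in range(m)
    ]

# Toy residuals for three clusters of size two (illustrative values only).
sigma_hat = residual_covariance([[1.0, 0.0], [0.0, 2.0], [1.0, 2.0]])
```

The estimate is symmetric by construction, and it can be passed on as the common working covariance matrix in the estimating equations.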

Remark 2. When $R_1 = \cdots = R_n = R$, it is not necessary to use the
bias-corrected GEE (2.4). In particular, when using GEE (2.3), Lemma A.7 in Wang et
al. (2010) can be followed line by line (with the extra simplification that we are
dealing with single-index models instead of partially linear single-index models in
that paper) to show that the resulting estimator is asymptotically normal with
covariance matrix $\Sigma_1$, where



(r), 



V, = lim -y^E\{g'{Xjf3o)J^,M^Rk\g'{^/f3o)J] 



fc=l 



'0 



It is obvious that $V_1 \ge V$ (i.e. $V_1 - V$ is nonnegative definite) and thus
$\Sigma_1 \le \Sigma_*$, which means that the estimator obtained from (2.3) is more
efficient than that obtained from (2.4). However, theoretically, using bias correction
leads to simpler assumptions on the bandwidth. In particular, unlike the theoretical
results presented in Chang et al. (2010), we do not need to use different bandwidths
when estimating $g$ and $g'$ if (2.4) is used. In our simulations, our experience is
that empirically the difference between using (2.3) and (2.4) is very small and thus
we report the simulation results based on the bias-corrected GEE only. When $R_i$ are
not all equal, the original proof in Wang et al. (2010) breaks down, and this is the
reason for proposing (2.4), which makes our presentation more general and works in all
cases.
3 Variable selection and the oracle property

So far in our discussions, all the covariates are assumed to be important for
predicting $Y$. However, in many practical situations, some covariates are independent
of or have negligible correlations with the response variable. As mentioned in the
introduction, many shrinkage based approaches have been proposed in the literature
to solve this variable selection problem, most of which are based on penalty functions
with a singularity at zero. As an alternative method, Ueki (2009) proposed
smooth-threshold estimating equations (SEE). This method is easily implemented
with Newton-Raphson type algorithms, and solving it is almost the same as solving the
original estimating equations under the full model.

Let $A = \{1, 2, \ldots, p\}$ be the index set for the components of $\beta$. We make
the sparsity assumption that some components of $\beta_0$ are zeros and without loss
of generality assume the first $p_0$ components are nonzero and let
$A_0 = \{1, 2, \ldots, p_0\}$, and thus $A_0^c$ contains all the indices of the zero
components. Following Ueki (2009), we propose the following smooth-threshold
generalized estimating equations (SGEE) for simultaneous variable selection and
estimation.



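$$(I_{p-1} - D) \sum_{i=1}^n \hat Z_i^T R_i^{-1}(Y_i - \hat g(X_i^T\beta))
+ D\beta^{(r)} = 0, \tag{3.1}$$

where

$$\hat Z_i = (\hat g'(X_{i1}^T\beta) J_{\beta^{(r)}}^T
[X_{i1} - \hat E(X_{i1} | X_{i1}^T\tilde\beta)], \ldots,
\hat g'(X_{im_i}^T\beta) J_{\beta^{(r)}}^T
[X_{im_i} - \hat E(X_{im_i} | X_{im_i}^T\tilde\beta)])^T,$$

$$D = \mathrm{diag}(\delta_1, \ldots, \delta_{r-1}, \delta_{r+1}, \ldots, \delta_p),$$

with $\delta_i = \min\{1, \lambda / |\tilde\beta_i|^{1+\gamma}\}$, $i = 1, \ldots, p$,
$i \ne r$; and $\tilde\beta$ is the initial $\sqrt{n}$-consistent estimator as before.
The estimate of $\beta$ obtained from SGEE (3.1) is denoted by $\hat\beta$ and the set
of estimated nonzero indices is $\hat A = \{i : \hat\beta_i \ne 0\}$.

From (3.1), we see that $\delta_i = 1$ implies $\hat\beta_i = 0$, while if $\delta_i$
is negligibly close to zero, then (3.1) is similar to (2.4). The choice
$\delta_i = \min(1, \lambda / |\tilde\beta_i|^{1+\gamma})$, proposed in Ueki (2009),
satisfies the desired property that $\delta_i = 1$ for insignificant variables and
$\delta_i$ is negligible for significant variables, if the parameter $\lambda > 0$ is
appropriately chosen.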
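The thresholding weights are simple to compute once an initial estimate is available. The sketch below is illustrative only: it assumes the weight takes the form $\delta_i = \min(1, \lambda / |\tilde\beta_i|^{1+\gamma})$ following Ueki (2009), and the function name `threshold_weights`, the initial estimate and the tuning values are hypothetical.

```python
def threshold_weights(beta_init, lam, gamma, r):
    """Return {i: delta_i} for i != r, with 1-based component indices.

    delta_i = min(1, lam / |beta_init_i|^(1 + gamma)); an exactly zero
    initial estimate gets weight 1, the limiting value of the formula.
    """
    weights = {}
    for i, b in enumerate(beta_init, start=1):
        if i == r:
            continue  # the removed component is never thresholded
        if b == 0.0:
            weights[i] = 1.0
        else:
            weights[i] = min(1.0, lam / abs(b) ** (1.0 + gamma))
    return weights

# Hypothetical initial estimate with two clearly nonzero components.
beta_init = [0.70, 0.69, 0.03, -0.02, 0.01, 0.0]
delta = threshold_weights(beta_init, lam=0.01, gamma=1.0, r=1)
dropped = sorted(i for i, d in delta.items() if d == 1.0)
```

With these illustrative values the small components receive weight exactly one, so the corresponding coefficients would be set to zero, while the weight of the second component stays close to zero and leaves its estimating equation nearly unchanged.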

Theorem 2. Suppose the conditions C1-C7 in the Appendix hold, and $r \le p_0$. For
any positive $\lambda$ and $\gamma$ such that $n^{1/2}\lambda \to 0$ and
$n^{(1+\gamma)/2}\lambda \to \infty$ as $n \to \infty$, we have:
(i) variable selection consistency, i.e. $P(\hat A = A_0) \to 1$; (ii) asymptotic
normality, i.e. $n^{1/2}(\hat\beta_{A_0} - \beta_{0,A_0})$ is asymptotically normal
with mean zero and covariance matrix the same as when $A_0$ is known.

We note that in the statement of the theorem, we need to assume $r \le p_0$, that is,
the removed component is significant. In practice, we select this component based
on the initial estimator under the full model, and choose the component that has
the largest absolute value.
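The practical rule for picking the removed component amounts to a one-line search over the initial estimate. The helper below is a hypothetical illustration of that rule (the name `choose_removed_component` is not from the paper); it returns a 1-based index and, as an arbitrary convention of this sketch, the first index in case of ties.

```python
def choose_removed_component(beta_init):
    """Return the 1-based index r of the largest |beta_init_i|."""
    best = max(range(len(beta_init)), key=lambda i: abs(beta_init[i]))
    return best + 1

r = choose_removed_component([0.1, -0.9, 0.4])
```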

To use the SGEE in practice, we need to choose appropriately the tuning
parameters $(\lambda, \gamma)$. Following Ueki (2009), we use a BIC-type criterion to
choose these two parameters. That is, we choose $(\lambda, \gamma)$ as the minimizer of

n 

where $\hat\beta_{\lambda,\gamma}$ is the estimator for given $(\lambda, \gamma)$, and
$df_{\lambda,\gamma}$ is the number of estimated nonzero parameters.
4 Numerical studies

4.1 Simulations

In this section, we carry out some simulations to evaluate the finite sample
performance of our proposed method. For each example below, we generate 200 data sets,
each consisting of $n = 50$ or 100 subjects. For Examples 1-3, we have
$m_k = m = 3$ observations per subject. Within a cluster, the covariance of the error
is specified by $\mathrm{Cov}(\varepsilon_{m'}, \varepsilon_{m''}) = 0.5^{|m'-m''|}$,
$m', m'' = 1, \ldots, m$. For Example 4, we have $m_k = 1, 2, 3$ for $k \le n/3$,
$n/3 < k \le 2n/3$, and $k > 2n/3$, respectively. Within a cluster, the covariance of
the error is specified by
$\mathrm{Cov}(\varepsilon_{km'}, \varepsilon_{km''}) = 0.5^{|m'-m''|}$,
$1 \le m', m'' \le m_k$. The kernel function is taken to be
$K(x) = \frac{3}{4}(1 - x^2)$ if $|x| \le 1$, and 0 otherwise, and the bandwidth $h$
is selected by leave-one-out cross validation. We compare the proposed estimator
$\hat\beta$ with the oracle estimator (when the zero coefficients are known), the
estimator $\beta^*$ and also with $\hat\beta_I$, which is the solution of SGEE (3.1)
using identity matrices as working covariance matrices. The following criteria are
considered.

• The square of the R statistic: = ||t|^; 

• The number of zero coefficients and nonzero coefficients obtained by different 
methods: "TN" is the average number of zero coefficients correctly estimated 
as zero, and "TP" is the number of nonzero coefficients identified as nonzero. 
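For a single fitted index vector, the two selection counts can be computed as in the sketch below; the reported "TN" and "TP" are averages of such counts over the simulated data sets. The function name `selection_counts` and the example vectors are hypothetical.

```python
def selection_counts(beta_hat, beta_true):
    """Return (TN, TP) for one fit.

    TN counts true zeros estimated as exactly zero;
    TP counts true nonzeros estimated as nonzero.
    """
    pairs = list(zip(beta_hat, beta_true))
    tn = sum(1 for bh, bt in pairs if bt == 0.0 and bh == 0.0)
    tp = sum(1 for bh, bt in pairs if bt != 0.0 and bh != 0.0)
    return tn, tp

# One hypothetical fit: a single false positive in the fourth position.
tn, tp = selection_counts(
    [0.70, 0.71, 0.0, 0.05, 0.0, 0.0],
    [0.7071, 0.7071, 0.0, 0.0, 0.0, 0.0],
)
```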

In these simulations, for SGEE estimator, the common intra-cluster covariance 
matrix is estimated nonparametrically from the residuals based on the initial esti- 
mator assuming working independence. 

Example 1. Consider the single-index model for longitudinal data

$$Y_{ij} = \exp(X_{ij}^T\beta_0) + \varepsilon_{ij}, \quad i = 1, \ldots, n, \;
j = 1, \ldots, 3,$$

where $X_{ij} = (X_{ij1}, \ldots, X_{ij6})^T$ was generated from a multivariate normal
distribution with identity covariance matrix. The true parameter is
$\beta_0 = \frac{1}{\sqrt{2}}(1, 1, 0, 0, 0, 0)^T$. The numerical results are reported
in Table 1.
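A data set of the kind described in Example 1 can be generated with the standard library alone. The sketch below is an illustration under stated assumptions, not the authors' simulation code: the name `simulate_example1` is hypothetical, the within-cluster error correlation is taken to be $0.5^{|j-k|}$ with unit variances and Gaussian errors (our reading of the setup), and the errors are produced by an AR(1) recursion, which yields exactly that covariance.

```python
import math
import random

def simulate_example1(n, m=3, rho=0.5, seed=0):
    """Draw n clusters of size m from Y_ij = exp(X_ij^T beta0) + eps_ij.

    Returns (beta0, clusters), where clusters[i] is a list of (x, y) pairs.
    """
    rng = random.Random(seed)
    beta0 = [1.0 / math.sqrt(2.0)] * 2 + [0.0] * 4
    clusters = []
    for _ in range(n):
        # AR(1) recursion: Cov(eps_j, eps_k) = rho^|j-k| with unit variances.
        eps = [rng.gauss(0.0, 1.0)]
        for _ in range(m - 1):
            innovation = rng.gauss(0.0, 1.0)
            eps.append(rho * eps[-1] + math.sqrt(1.0 - rho * rho) * innovation)
        rows = []
        for j in range(m):
            # Independent standard normal covariates (identity covariance).
            x = [rng.gauss(0.0, 1.0) for _ in range(len(beta0))]
            index = sum(xq * bq for xq, bq in zip(x, beta0))
            rows.append((x, math.exp(index) + eps[j]))
        clusters.append(rows)
    return beta0, clusters

beta0, clusters = simulate_example1(n=50)
```

Fixing the seed makes a run reproducible; repeating the call with 200 different seeds would mimic the replication scheme used in the simulations.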
  n     Method            R^2       TN       TP
  50    Oracle            0.9982    4        2
        $\beta^*$         0.9895    -        2
        $\hat\beta$       0.9935    3.955    2
        $\hat\beta_I$     0.9817    3.525    2
  100   Oracle            0.9994    4        2
        $\beta^*$         0.9962    -        2
        $\hat\beta$       0.9960    3.985    2
        $\hat\beta_I$     0.9854    3.75     2

Table 1: Simulation results for Example 1.



Example 2. Similar to Example 1, except that we let
$\beta_0 = \frac{1}{\sqrt{1.4}}(1, 0.6, 0.2, 0, 0, 0)^T$. The numerical results are
reported in Table 2.

Example 3. Similar to Example 1, except that we use a different link function
$g(X^T\beta_0) = \sin(X^T\beta_0)$, which is nonmonotone. The numerical results are
reported in Table 3.

Example 4. Similar to Example 1, except that $m_k$, $k = 1, \ldots, n$, are different.
The numerical results are reported in Table 4.

Tables 1-4 show that for our four examples, SGEE can satisfactorily identify
the true model. Besides, it is advantageous to take into account the correlation of
the observations.



4.2 Real data

We now apply the proposed procedure to the CD4 data from the Multi-Center AIDS
Cohort Study. This data set has been studied in Kaslow et al. (1987); Fan and Li
(2004); Fan et al. (2007); Li et al. (2010). The data set contains the human
immunodeficiency virus (HIV) status of 283 homosexual men who were infected with HIV
during the follow-up period between 1984 and 1991. Details of the study design,
methods, and medical implications can be found in Kaslow et al. (1987). All
individuals were scheduled to have their measurements made during semiannual visits.
  n     Method            R^2       TN       TP
  50    Oracle            0.9970    3        3
        $\beta^*$         0.9840    -        3
        $\hat\beta$       0.9329    2.78     2.255
        $\hat\beta_I$     0.9128    2.615    2.47
  100   Oracle            0.9986    3        3
        $\beta^*$         0.9925    -        3
        $\hat\beta$       0.9527    2.94     2.3
        $\hat\beta_I$     0.9411    2.8      2.395

Table 2: Simulation results for Example 2.

  n     Method            R^2       TN       TP
  50    Oracle            0.9832    4        2
        $\beta^*$         0.9171    -        2
        $\hat\beta$       0.9756    3.885    2
        $\hat\beta_I$     0.9558    3.775    2
  100   Oracle            0.9928    4        2
        $\beta^*$         0.9648    -        2
        $\hat\beta$       0.9912    3.94     2
        $\hat\beta_I$     0.9889    3.88     2

Table 3: Simulation results for Example 3.

  n     Method            R^2       TN       TP
  50    Oracle            0.9980    4        2
        $\beta^*$         0.9806    -        2
        $\hat\beta$       0.9972    3.965    2
        $\hat\beta_I$     0.9966    3.85     2
  100   Oracle            0.9996    4        2
        $\beta^*$         0.9931    -        2
        $\hat\beta$       0.9990    3.995    2
        $\hat\beta_I$     0.9986    3.99     2

Table 4: Simulation results for Example 4.
However, many participants missed some of their scheduled visits, resulting in
different measurement time points and unequal numbers of measurements per individual.
In our analysis, we let $Y_{ij}$ be the CD4 cell counts for individual $i$ at the
$j$th visit, $X_{ij1}$ be the smoking status with 1 for a smoker and 0 for a
nonsmoker, $X_{ij2}$ be the person's age, and $X_{ij3}$ be the last measured CD4 level
before HIV infection. For exploratory purposes, we also consider possible interactions
of the covariates and also squares of $X_{ij2}$ and $X_{ij3}$, resulting in the
following model:

$$Y_{ij} = g(X_{ij1}\beta_1 + X_{ij2}\beta_2 + X_{ij3}\beta_3 + X_{ij2}^2\beta_4
+ X_{ij3}^2\beta_5 + X_{ij1}X_{ij2}\beta_6 + X_{ij1}X_{ij3}\beta_7
+ X_{ij2}X_{ij3}\beta_8) + \varepsilon_{ij}.$$
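The eight index terms of the exploratory model can be assembled per observation as in the following sketch. The function name `index_terms` and the example inputs are hypothetical; any centering or scaling of age and CD4 level applied before fitting is not specified here and is left out.

```python
def index_terms(smoke, age, pre_cd4):
    """Return the eight covariate terms in the order beta_1, ..., beta_8."""
    return [
        smoke,            # X_ij1
        age,              # X_ij2
        pre_cd4,          # X_ij3
        age ** 2,         # X_ij2^2
        pre_cd4 ** 2,     # X_ij3^2
        smoke * age,      # X_ij1 * X_ij2
        smoke * pre_cd4,  # X_ij1 * X_ij3
        age * pre_cd4,    # X_ij2 * X_ij3
    ]

row = index_terms(1, 2.0, 3.0)
```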

We apply the SGEE approach to this data set to select significant variables and
estimate the effects. The tuning parameters $\lambda$ and $\gamma$ are selected by the
BIC-type criterion. For any individual, we assume the correlation between visits at
times $t_{j_1}$ and $t_{j_2}$ is $\alpha^{|t_{j_1} - t_{j_2}|}$. The fitted model is

$$Y_{ij} \approx \hat g(0.4531 X_{ij1} - 0.6744 X_{ij2} + 0.5829 X_{ij3}).$$

By our variable selection procedure, we can see that only the linear terms are
significant.
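Because visit times differ across individuals, the working correlation matrix in this analysis has to be built cluster by cluster. A minimal sketch, with the hypothetical name `visit_correlation` and illustrative visit times, is:

```python
def visit_correlation(times, alpha):
    """Return the matrix with (s, t) entry alpha^|time_s - time_t|."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha is assumed to lie strictly between 0 and 1")
    return [[alpha ** abs(s - t) for t in times] for s in times]

# Three visits at irregular times (in years), illustrative values only.
corr = visit_correlation([0.0, 0.5, 1.5], alpha=0.25)
```

The matrix has ones on its diagonal, and the correlation decays with the time gap between visits, so unequally spaced visits are handled without any extra bookkeeping.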

Appendix

In order to study the asymptotic behavior of the estimator, the following standard
assumptions are imposed (Li et al., 2010).

C1. The density function $f_{ij}(t)$ of $X_{ij}^T\beta$ is bounded away from zero and
continuously differentiable on
$\{t : t = X_{ij}^T\beta, X_{ij} \in \mathcal{A}, i = 1, \ldots, n; j = 1, \ldots, m_i\}$,
where $\mathcal{A}$ is the support of $X_{ij}$, which is assumed to be compact.

C2. The function $g(\cdot)$ is twice continuously differentiable, and
$E(X_{klq} | X_{kl}^T\beta = x)$, $1 \le l \le m_k$, $1 \le k \le n$,
$1 \le q \le p$, as a function of $x$ is Lipschitz continuous.

C3. The kernel $K$ is a bounded, continuous and symmetric probability density
function, satisfying

$$\int_{-\infty}^{\infty} u^2 K(u)\,du < \infty.$$



C4. There exists a positive constant M, such that m&5Li<k<n,i<j<mk ^i^tj) — 
M <oo. 

C5. The bandwidth h satisfies nh^ — )■ oo, nh^ — )■ 0. 

C6. The eigenvalues of $R_i$ and $\Sigma_i$ are uniformly bounded and bounded away
from zero.



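C7. $\Omega = \lim_{n\to\infty} \frac{1}{n} \sum_{k=1}^n
E\{(g'(X_k^T\beta_0) J_{\beta_0^{(r)}}^T \tilde X_k)^T R_k^{-1} \varepsilon_k\}^{\otimes 2}$
is positive definite, where we use the notation that
$\tilde X_k = X_k - E(X_k | X_k^T\beta_0)$ is the $m_k \times p$ matrix with entries
$X_{klq} - E[X_{klq} | X_{kl}^T\beta_0]$, $1 \le l \le m_k$, $1 \le q \le p$, and
$E(A)^{\otimes 2} = E(AA^T)$ for any matrix $A$. Moreover,

$$V = \lim_{n\to\infty} \frac{1}{n} \sum_{k=1}^n
E[(g'(X_k^T\beta_0) J_{\beta_0^{(r)}}^T \tilde X_k)^T R_k^{-1}
(g'(X_k^T\beta_0) J_{\beta_0^{(r)}}^T \tilde X_k)]$$

is also positive definite.

Remark. Note that in condition C1 we allow the distributions of $X_{ij}^T\beta$ to be
different for different $i, j$, and in particular $m_i$, $i = 1, \ldots, n$, are not
required to be the same.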

Proof of Theorem 1. The proof of existence of a $\sqrt{n}$-consistent solution to
(2.4) is almost the same as in Wang et al. (2010) and is omitted here. Thus we proceed
to consider asymptotic normality. By (2.4), since

$$\sum_{k=1}^n \hat Z_k^T R_k^{-1}(Y_k - \hat g(X_k^T\beta^*)) = 0,$$

it follows



k=l 
n 

k=l 
n 

J2 ([g'(Xr/3.) - g'(X^/3o)]JjM[X, - E{X,\Xl^)]fR-'e, 



k=l 
n 



k=l 
n 



+ Yl (g'(Xr/3o)4w[Xfc - Ei^kl^lP)]) R'k'isi^lPo) - g(Xr/3.)) 

k=l 

n 

+ ils'i^l^) - g'(Xl/3o)] Jjw[Xfc - ^(Xfc|Xp)])X'(g(Xr/3o) 



k=l 

Qi 



g2(/3f^)+Q3(/3r)+Q4(/3r)+Q 



r) ^ 



(0^ 



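Noting that $J_{\beta^{*(r)}} - J_{\beta_0^{(r)}} = O_p(n^{-1/2})$, we have

$$Q_1(\beta^{*(r)}) - U(\beta_0^{(r)}) = o_p(\sqrt{n}),$$

where

$$U(\beta_0^{(r)}) = \sum_{k=1}^n (g'(X_k^T\beta_0) J_{\beta_0^{(r)}}^T
(X_k - E[X_k | X_k^T\beta_0]))^T R_k^{-1} \varepsilon_k.$$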



For (52(/3i''^), denote 



p— 1 . . . 



then 



fc=i j=i j=i 



(xr/3*)) 

(Al) 



(A.2) 



]. (A3) 
Noting that $\beta^*, \tilde\beta \in B$, together with conditions C2 and C3, we have

n rrik wife „ 



k=l j=l i=l 



d 



where /3i and /32 are the intermediate values between (3o and Thus, 

fc=i j=i i=i 



Let Q20i''^) = J'^(r)Q20*'')* ■, where the sth component of Q20^j'')* is 



Q2(/3l^)): = E E E^^^-^^"^;-[^'(^fc'^^/5o) - 9\XlP,)][X,,, - E{X,,,\XlP,)]. 

k=l j=l i=l 

By (O, let Xfci, = [Xk^s-EiXk^s\XlM] be the st/i component of Xki-E{Xk^\XlPo), 
we have 



Q2(/3i^)): 

= E E E ^''J^^^sB^i [ E E ^"'1'^ (^^"^^/^o, /3o)^7(Xj,/3o) - g'iXjM 

k = l i = l j = l li = ll2 = l 

+ EE ("^S^O' f^o)Xkjs£ljRkjj + E E (^fci^O' f^o)Xkis£ljRkij 

k=l j=l k=l jj^i 

n mi, mj. n "^ii 

+ E E E E E {xiPo, Po)XkisR 

k=l i=l j=l h^k hf^i 
'■— Q2IS + Q22S + Q23S + '524s- 



Similar to the proof of Lemma A.4 in Li et al. (2010), utilizing also Lemmas
A.1-A.3 there, we can show that $Q_2(\beta^{*(r)})^*_s = o_p(\sqrt{n})$ and thus

$$Q_2(\beta^{*(r)}) = o_p(\sqrt{n}). \tag{A.4}$$
Similarly, we can obtain



{A.5) 



For Qii/S*^), simple calculations yield 



n irik rrik 

QM'^) = E E E 9'{Xl^^)jJ,^, [X,, - E{Xu,\XlA)]Rll^ {9{Xlk) - giXlfi.)) 

k=l i=l j=l 

- E E E 9\x^^MJ^^ir> - E{x,,\xlMRki 

k=l i=l j=l 

x^7'(Xj/3o){ JJmIX,. - E{Xk^\XlPo)]V0i^^ - + o,(v^) 
It is easy to show that Q^iif^i^^) = Op{y/n) and that 

where 

$$V = \lim_{n\to\infty} \frac{1}{n} \sum_{k=1}^n
E[(g'(X_k^T\beta_0) J_{\beta_0^{(r)}}^T \tilde X_k)^T R_k^{-1}
(g'(X_k^T\beta_0) J_{\beta_0^{(r)}}^T \tilde X_k)], \tag{A.7}$$

is a positive definite matrix. Thus



In summary, by estimating equation (2.4), together with (A.2), (A.4), (A.5) and
(A.8), it follows

$$0 = U(\beta_0^{(r)}) + o_p(\sqrt{n}) - nV(\beta^{*(r)} - \beta_0^{(r)}),$$

$$\sqrt{n}(\beta^{*(r)} - \beta_0^{(r)}) = V^{-1} n^{-1/2} U(\beta_0^{(r)}) + o_p(1).
\tag{A.9}$$

Thus, we have

$$\sqrt{n}(\beta^* - \beta_0) = J_{\beta_0^{(r)}} V^{-1} n^{-1/2} U(\beta_0^{(r)})
+ o_p(1). \tag{A.10}$$

The asymptotic normality of $\beta^*$ directly follows from this representation and
the central limit theorem. □

Proof of Theorem 2. First, for $j \in A_0^c$, we have
$\tilde\beta_j = O_p(n^{-1/2})$ by the assumption of $\sqrt{n}$-consistency of the
initial estimator. Using the condition on $\lambda$ in the statement of the theorem,
we get

$$P(\lambda / |\tilde\beta_j|^{1+\gamma} < 1) \to 0, \quad j \in A_0^c, \tag{A.12}$$

and thus

$$P(\delta_j = 1 \text{ for all } j \in A_0^c) \to 1.$$

On the other hand, we have for any $\epsilon > 0$ and $j \in A_0 - \{r\}$,

$$P(\delta_j > n^{-1/2}\epsilon) \le
P(\lambda / |\tilde\beta_j|^{1+\gamma} > n^{-1/2}\epsilon) \to 0,$$

using that $\lambda n^{1/2} \to 0$ and that $|\tilde\beta_j|$ is bounded away from
zero. Thus $\delta_j = o_p(n^{-1/2})$ for each $j \in A_0 - \{r\}$, implying trivially
$P(\delta_j < 1 \text{ for all } j \in A_0 - \{r\}) \to 1$, and (i) is proved.

Next, we prove (ii). From (i) and the assumption that the $r$th component of
$\beta_0$ is nonzero, the SGEE coincide with

$$(1 - \delta_j) U_j(\hat\beta^{(r)}) + \delta_j \hat\beta_j = 0, \quad
\text{for } j \in A_0 - \{r\}, \tag{A.13}$$

and $\hat\beta_j = 0$ for $j \in A_0^c$, with probability tending to one, where
$U_j(\beta^{(r)})$ is the $j$th component of
$\sum_{k=1}^n \hat Z_k^T R_k^{-1}(Y_k - \hat g(X_k^T\beta))$, $j \in A_0 - \{r\}$.
Using that $\delta_j = o_p(n^{-1/2})$ for $j \in A_0 - \{r\}$, it is easy to show that
(A.13) is asymptotically equivalent to $U_j(\hat\beta^{(r)}) = 0$, and the asymptotic
normality follows the same way as in the proof of Theorem 1.